Identifying Genres of Web Pages
Abstract
In this paper, we present an inferential model for identifying the text types and genres of web pages: text types are inferred using a modified form of Bayes’ theorem, and genres are derived using a few simple if-then rules. Because the genre system of the web is complex, and web pages are far less predictable and more individualized than paper documents, we propose this approach as an alternative to both unsupervised and supervised techniques. The inferential model yields a classification that can accommodate genres that are not fully standardized, and it better reflects the actual nature of web pages, which rarely correspond to an ideal type and often combine several genres or belong to none at all. A proper evaluation of such a model remains an open issue.
Keywords: genre, text typologies, web pages, deductive-inductive model, automatic identification, Bayes’ theorem
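The abstract does not spell out how Bayes’ theorem is modified, so the following is only a minimal sketch of the general two-stage idea under stated assumptions. It uses the standard naive-Bayes form of the theorem (in log-odds), scoring each text type as an independent yes/no hypothesis, and then applies a few if-then rules to the resulting text-type profile. The text types, page features, probabilities, 0.5 threshold, and genre labels are all hypothetical, invented for illustration rather than taken from the paper.

```python
import math

# HYPOTHETICAL text types, features and probabilities, for illustration only.
PRIORS = {"instructional": 0.2, "narrative": 0.3, "argumentative": 0.3}

# For each text type T and binary feature f: (P(f present | T), P(f present | not T)).
LIKELIHOODS = {
    "instructional": {"imperative_verbs": (0.9, 0.2), "numbered_list": (0.7, 0.2),
                      "past_tense": (0.2, 0.5), "evaluative_adjectives": (0.3, 0.4)},
    "narrative": {"imperative_verbs": (0.1, 0.4), "numbered_list": (0.1, 0.3),
                  "past_tense": (0.9, 0.3), "evaluative_adjectives": (0.4, 0.4)},
    "argumentative": {"imperative_verbs": (0.3, 0.3), "numbered_list": (0.2, 0.3),
                      "past_tense": (0.3, 0.4), "evaluative_adjectives": (0.9, 0.2)},
}

# HYPOTHETICAL if-then rules: (types that must be present, types that must be absent, genre).
RULES = [
    ({"instructional"}, set(), "how-to"),
    ({"narrative"}, {"argumentative"}, "diary"),
    ({"argumentative"}, {"narrative"}, "editorial"),
    ({"narrative", "argumentative"}, set(), "review"),
]


def text_type_profile(features):
    """Return P(T | observed features) for each text type, via the standard
    Bayes' theorem in log-odds form with a naive independence assumption."""
    profile = {}
    for text_type, prior in PRIORS.items():
        log_odds = math.log(prior / (1 - prior))  # prior odds of T vs. not T
        for feature, (p_t, p_not_t) in LIKELIHOODS[text_type].items():
            if feature in features:
                log_odds += math.log(p_t / p_not_t)  # evidence: feature present
            else:
                log_odds += math.log((1 - p_t) / (1 - p_not_t))  # feature absent
        profile[text_type] = 1 / (1 + math.exp(-log_odds))  # back to a probability
    return profile


def derive_genres(profile, threshold=0.5):
    """Fire every matching if-then rule; an empty list means 'no genre'."""
    present = {t for t, p in profile.items() if p >= threshold}
    return [genre for required, forbidden, genre in RULES
            if required <= present and not (forbidden & present)]


if __name__ == "__main__":
    pages = [{"imperative_verbs", "numbered_list"},
             {"past_tense", "evaluative_adjectives"},
             set()]
    for page in pages:
        profile = text_type_profile(page)
        print(sorted(page), {t: round(p, 2) for t, p in profile.items()},
              derive_genres(profile))
```

Because each text type is scored as a separate hypothesis, the posteriors need not sum to one: a page can show several text types at once (and so receive a mixed or composite genre) or none above the threshold (and so receive no genre), in line with the abstract’s point that real web pages rarely match an ideal type.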
Similar resources
An n-gram Based Approach to the Classification of Web Pages by Genre
The extraordinary growth in both the size and popularity of the World Wide Web has created a growing interest not only in identifying Web page genres, but also in using these genres to classify Web pages. The hypothesis of this research is that an n-gram representation of a Web page can be used effectively to automatically classify that Web page by genre. This research involves the development ...
Genres In Formation? An Exploratory Study of Web Pages using Cluster Analysis
The Web is a new, large and heterogeneous community where the interaction among users and the possibilities offered by technology may modify existing genres or create new ones. In fact, most genres borrowed from the paper world have undergone adjustments when moving to the Web (for instance, online newspapers and online manuals). Also, there is a family of genres, which have been cre...
Cybergenre: Automatic Identification of Home Pages on the Web
The research reported in this paper is part of a larger project on the automatic classification of web pages by their genres. The long-term goal is the incorporation of web page genre into the search process to improve the quality of the search results. In this phase, a neural net classifier was trained to distinguish home pages from non-home pages and to classify those home pages as personal h...
Reproduced and emergent genres of communication on the World-Wide Web
The World-Wide Web is growing quickly and being applied to many new types of communications. As a basis for studying organizational communications, Yates and Orlikowski (1992; Orlikowski and Yates, 1994) proposed using genres. They defined genres as "typified communicative actions characterized by similar substance and form and taken in response to recurrent situations" (Yates and Orlikowski, 1...
Genre Analysis of Bookmarked Web Pages
Purpose – A total of 17 user-compiled collections of webpages, comprising 833 bookmarked links, are studied in terms of genre. The purpose of this paper is to find out whether users tend to bookmark certain web genres more than others. Genre theory helps to make sense of the different pages included in these collections, and to classify them, according to their communicative purpose and salient...